In the last class, we learned everything from basic concepts of R to
data wrangling. The current class focuses on communicating results using
graphical methods. For this, we will use the ggplot2
package, a popular and robust package for data visualization.
ggplot2 was developed by Hadley Wickham, Winston Chang,
Lionel Henry, Thomas Lin Pedersen, Kohske Takahashi, Claus Wilke, Kara
Woo, Hioraki Yutani, and Dewey Dunnington. It is included as a ‘core’
package of tidyverse. This means that ggplot2
follows the same core structure as tidyverse and that it is
installed with tidyverse. ggplot2 is one of
the most popular R packages because it can be used to produce a great
variety of quality plots. There are a number of packages that build on
ggplot2 functionality (also known as ‘extensions’) that we
will talk about later in this section.
The structure of ggplot2 was inspired by the book The
Grammar of Graphics, by Leland Wilkinson. One of the strengths of
ggplot2 is the ability to build on a basic plot by
adding new components as layers. This allows the user
flexibility in building graphics with their desired characteristics. A
base plot is created with ggplot(), and more
ggplot2 components can be added to modify the graph.
The most important components of ggplot2 graphs are:
aes()
arguments (short for ‘aesthetics’) to assign characteristics like color
and size to elements of your graphWe will use the example data from the last lesson to illustrate the
uses of these functions and components to you. We will also teach you
how to combine your graphs to create compelling figures, and to save
these figures to maintain their quality in manuscripts and
presentations. We will then briefly introduce you to some of
ggplot2’s many extension packages, which may allow you to
create an even greater variety of figures and custom features.
There are extension packages that expand the geometric elements and
other functions from ggplot2. You can check a gallery of
different extension following this link. A few
examples are shown below.
Examples of ggplot2 extension packages. From left to right:
ggwordcloud, ggridges, and
gggenes
First, we will load in ggplot2 and our example data from
the last lesson (Intro to R). Remember that ggplot2 is a
core package in tidyverse, so it can be loaded with
library("ggplot2") or library("tidyverse").
We’ll also include conflicted() since there are some
instances of different packages using identical functions.
# Load tidyverse (which includes ggplot2) and conflicted packages
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.4.4 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(conflicted)
# Tell R which version of `filter()` you want to use
conflicts_prefer(dplyr::filter)
## [conflicted] Will prefer dplyr::filter over any other package.
# Load in the example data and inspect it
data_demo <- readr::read_csv("data/intro_r/data_demo.csv")
## Rows: 64 Columns: 9
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): trt, var
## dbl (7): plot, blk, sev, inc, yld, don, fdk
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
data_demo
## # A tibble: 64 × 9
## plot trt var blk sev inc yld don fdk
## <dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 107 A R 1 1.95 35 86.6 0.11 4
## 2 109 A R 1 1.2 20 80.3 0.15 4
## 3 204 A R 2 0.9 20 81.1 0 42
## 4 214 A R 2 4.05 50 85.1 0.07 25
## 5 305 A R 3 1.1 15 84.8 0 29
## 6 316 A R 3 1.8 10 93.7 0 24
## 7 406 A R 4 1.4 25 84.5 0.06 37
## 8 412 A R 4 0.45 10 84.8 0.07 42
## 9 108 A S 1 13.3 55 92.3 0.42 NA
## 10 110 A S 1 12.3 50 103. 0.26 NA
## # ℹ 54 more rows
Note: You can inspect the data more closely by clicking on it in the Global Environment. This panel is typically in the upper right corner.
Every ggplot2 plot starts with the function,
ggplot(). This initializes the plot, and is where you will
define the data you want used to create the plot. In this case, the data
we want to use is in the data_demo object we just loaded in. Let’s
investigate how ggplot() works without any input:
ggplot()
Note: when you do not inform ggplot2 of
what data you want to plot, the function ggplot() will only
print a gray panel. It has initialized the plot, but doesn’t have any
data to map yet.
This is where data and aesthetics enter the
discussion as arguments within the ggplot() function. The
data argument is pretty self-explanatory, but the arguments within
aes() determines the appearance of the plot. In this case,
it determines the variables we are going to map and which axis they will
belong to.
ggplot(data = data_demo, aes(x = trt, y = yld)) # Remember, aes is short for aesthetics
So far, we have only told ggplot() that our graphic will
have information about treatments values in the x-axis and the variable
yield on the y-axis. However, there are plenty of plots that could be
created with quantitative and qualitative variables, such as box plots
or line plots.
At this point, we will indicate to R exactly what type of plot we want to create. This is where information about the geometric elements is considered.
geom_*()In this step, we indicate to R the specific type of graphic we are
going to create (line plots, point charts, histograms, etc.,) with the
material used or defined in in ggplot(). We do this by
adding a geometric component with one of many geom_*()
functions.
Below is an example plotting the points (or raw data) from data_demo
as a point graph. We do this by adding a new layer, with
geom_point(), to tell ggplot to show us a
point graph.
# Note that we use the '+' sign to connect the second line of code to the first.
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_point()
We’ve successfully created a point graph using a
geom_*() function. But it could be presented in a better
manner. We can use a different geom_*() function to assign
a different plot type. In this case, geom_jitter() adds a
small amount of random variation to the location of each data
observation.
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_jitter()
We can see the individual data points more easily, but it’s hard to
tell which treatments they come from. We can use argument ‘width’ within
the geom_jitter() function to reduce the amount of
dispersion along the x-axis.
# By default, the 'width' value of 'geom_jitter()' is 0.4. Here we halve the variation applied with 'geom_jitter()' by specifying 'width' to 0.2.
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_jitter(width = .2)
Note: all geom_*() functions (geom_jitter(), geom_bar(),
etc.,) have additional arguments that can be added or modified,
depending on the need. To find out what arguments are available or what
default values are, try ?geom_*() to pull up information
about the different geom_*() functions in RStudio. You can
also use a data
visualization cheat sheet. More translations of this cheat sheet can
be found here.
Examples of geom_* functions
geom_*()While point graphs may illustrate the dispersion in a data set well
enough, they are not the only way to represent the data. We will use
different geom_*() functions to illustrate some of the
different ways to visualize the data. Extension packages may have even
more geom_*() functions that will allow you to create even
more types of plots.
Boxplot (geom_boxplot())
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot()
Violin plot (geom_violin())
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_violin()
Dot plot (geom_dotplot())
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_dotplot(binaxis = "y", stackdir = "down")
## Bin width defaults to 1/30 of the range of the data. Pick better value with
## `binwidth`.
# 'binaxis' argument tells R how to group the data. 'stackdir' determines how to distribute the dots along the x-axis.
We can think about using geom_*() functions as creating
layers for our plot. We can stack these functions to improve the quality
of the plot, provide more information about the data, and to compare
different factors of interest.
# Layer a box plot with a point chart:
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot()+
geom_jitter(width = .2, size = 4)
These geom_*() features will ‘layer’ on top of each
other in the order they’re added, meaning that the last
geom_*() function used will be laid over the preceding
ones. There is technically no limit to the number of layers (geom_*)
that you can use in a plot, but using too many can make your plots
confusing.
# Applying multiple layers (box plot, point chart, violin plot)
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot()+
geom_jitter(width = .2, size = 4)+
geom_violin() # Note that the violin plot is the last layer to be added, meaning it will be applied over the other features.
Sometimes reordering the geom_*() layers can improve the
visibility of a graphic, along with adding arguments to resize
features.
# Multiple layers, different order:
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_violin()+
geom_boxplot(width = 0.3)+ # using the argument width we can adjust the box size
geom_jitter(width = .2, size = 4)
Now we can learn to add aesthetics like color, different shapes and
size, and more to our plots with aes() arguments.
The next component in ggplot2 plots is aesthetics, which
is how our observations will be mapped in the plot. So far, we have used
aesthetics to define x and y-axis positions with aes()
arguments within the base function ggplot().
Aesthetics can refer to multiple characteristics. The most common characteristics changed with aesthetics are:
Within a given plot, there are multiple items that we can assign
colors to. The arguments color (or colour),
fill, and alpha enable you to assign color to
different elements of plots. color assigns color to points
or lines. fill assigns color to enclosed spaces (within
box, polygons, circles, etc.,). alpha allows you to control
the transparency of colors within these elements.
We can add the color argument to our base function,
ggplot() to assign colors to different treatments.
ggplot(data = data_demo, aes(x = trt, y = yld, color = trt)) +
geom_boxplot()+
geom_jitter(width = 0.2, size = 4)
What if we only want color applied to the points, and not to the box plot component?
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot()+
geom_jitter(aes(color = trt), width = 0.2, size = 4)
Compare this to the last plot. When we assign the color
argument to a geom layer instead of to the base plot function,
the colors are only applied to that layer.
We can apply color only to the box plot components of the graph, and
not to the points by applying a color argument to the
geom_boxplot() layer. Instead of using color,
we use fill to color the area within the box.
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot(aes(fill = trt))+
geom_jitter(width = 0.2, size = 4)
We can use alpha to shade points according to their
‘yld’ variable:
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot(size=2)+
geom_jitter(aes(alpha = yld), # Here we are using alpha to show a yield gradient on the points
width = 0.2, size = 4) # We increase the point size to see the differences better
If we use these arguments (color, fill, and alpha) outside of the aesthetic function, they change the whole element. Instead of using color to distinguish between different treatments or values, we color all of a certain element. For instance, the box plots can all be filled in green, the violin plots can all be outlined in red, and all data points can be a transparent black-grey.
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_violin(color = "red")+ # border color equal to red
geom_boxplot(width = 0.3, fill = "green")+ # fill equal to green
geom_jitter(alpha = 0.4, # 60% of transparency
width = 0.2, size = 4)
We will learn even more about assigning colors to different plot
elements when we look at the scale component of ggplots. For now, we can
move on to other aes() arguments. We will also learn how to
reference many different colors and shades, like "green",
"lightgreen", and "olivegreen" when we learn
about using scale components to modify plot colors.
The size aesthetic allows us to change the size of
certain elements. See how we can size data points in accordance to their
severity values:
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot()+
geom_jitter(aes(size = sev), #The size of the point changes depending on the severity value
width = 0.2)
We can size other geom_*() elements to change the
appearance of our plots, without adding any visual information.
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot(size = 4)
We will learn more about sizing plot elements when we learn about scale components. For now, we can move into using aesthetics components to determine shape.
There are many shapes included in ggplot2(), with the
most common shown below. The numbers assigned to these shapes make it
easy to ‘call’ them when building plots.
Different shape options
Notice how shapes 21- 25 can have an outline color and fill,
while the others only have color.
We can apply shapes to certain elements of our plots with the
shape argument within aes(). Here, we set
shape to distinguish between different treatments, with R choosing to
use shapes 0-4.
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot()+
geom_jitter(aes(shape = trt),
width = 0.2, size = 4)
We can also use the shape argument to assign specific
shapes to all points by taking it out of the aes()
argument:
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot() +
geom_jitter(shape = 9)
We will learn more about setting specific symbols for specific variables and elements when we learn about scale components.
There are several line types available in R that can be viewed below. These line types can also be referenced easily by their numbers.
Built-in line types
We can use the linetype argument to assign different
types of lines to different groups of data. Here we use
linetype aesthetics to distinguish between two groups in
the data, R (resistant) and S (susceptible).
Notice that we use a new geom_*() function and it’s
aes() arguments to create a line graph.
# We group the data by treatment and variety, and summarize the mean severity
data_line <- data_demo %>%
group_by(trt, var) %>%
summarise(sev = mean(sev))
## `summarise()` has grouped output by 'trt'. You can override using the `.groups`
## argument.
data_line
## # A tibble: 8 × 3
## # Groups: trt [4]
## trt var sev
## <chr> <chr> <dbl>
## 1 A R 1.61
## 2 A S 6.28
## 3 B R 1.7
## 4 B S 1.58
## 5 C R 0.475
## 6 C S 0.669
## 7 D R 0.431
## 8 D S 0.444
# Then we create a plot with different line types for treatments where var=R or var=S. Notice that we use the data structure 'data_line', which we created above.
ggplot(data_line) +
geom_line(aes(x = trt,
y = sev,
linetype = var, # Setting line type to distinguish between R and S varieties
group = var), # Telling R that it should group the data by variety
linewidth = 1) # Setting the thickness of the lines
We can also use linetype to set the type of line used
for all data by taking it out of aes().
ggplot(data = data_line, aes(x = sev, y = yld)) +
geom_line(aes(x = trt,
y = sev,
color = var, # Distinguish between R and S by color of line, rather than line type
group = var),
size = 1,
linetype = "dashed")
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
# Using 'linetype = 2' produces the same result as 'linetype = "dashed"'
As with the other aesthetic elements, we will learn more about setting line types when we learn about scale components.
aes()
argumentsSo far, we have built plots using the base function
ggplot() and layering on geom_*() elements,
and aes() arguments to determine plot type and style the
plot. We can further customize our plots with scale components.
Scale components control how aesthetics are mapped. For each aesthetic we’ve learned about, there is a corresponding scale function. Just as in the aesthetics section, we can start by looking at how scale elements can be used to modify axes.
Let’s say we want to create a dot chart and box plot representation of our data, with treatment (which is categorical) as the x-axis, and yield (quantitative) as our y-axis. We could just do this:
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot() +
geom_jitter(aes(fill = trt),
shape = 21,
width = 0.2,
size = 4,
show.legend = FALSE)
R and ggplot2 do a good job of setting the range and
tick-mark location for the y-axis. But, if needed, we can further modify
the range and tick-marks of both x and y-axes with functions
scale_x_continuous() and
scale_y_continuous().
Let’s say you want more room to put labels or images on this plot for
your poster, presentation, or manuscript. You also will want to label
the y-axis. We can do both of these things for the y-axis with
scale_y_continuous:
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot() +
geom_jitter(aes(fill = trt),
shape = 21,
width = 0.2,
size = 4,
show.legend = FALSE) +
scale_y_continuous(name = "Yield", # Set the y-axis label to "Yield"
limits = c(70,130)) # Expand the lower and upper limits of the y-axis to 70 and 130
This is good, but imagine you want the tick-marks to occur at more frequent intervals:
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot() +
geom_jitter(aes(fill = trt),
shape = 21,
width = 0.2,
size = 4,
show.legend = FALSE) +
scale_y_continuous(name = "Yield", # Set the y-axis label to "Yield"
limits = c(70,130), # Expand the lower and upper limits of the y-axis to 70 and 130
breaks = c(seq(70,130,10))) # Tick marks occur at every 10th value between 70 and 130
Remember, these same operations can be applied to the x-axis with
scale_x_continuous. It doesn’t make sense to do this on
categorical data like treatment, but it can be helpful if you have a
quantitative x variable. You could create a plot to visualize the
relationship between two continuous variables, severity and yield:
ggplot(data = data_demo, aes(x = sev, y = yld)) +
geom_jitter(aes(fill=trt),
shape = 21,
width = 0.2,
size = 4,
show.legend = FALSE) +
scale_y_continuous(name = "Yield",
limits = c(70,130),
breaks = c(seq(70,130,10))) +
scale_x_continuous(name = "Severity",
limits = c(0,15))
## Warning: Removed 7 rows containing missing values (`geom_point()`).
We can also use scale_y_continuous to add a third axis
to our plot. Image we want to represent our yield values in two
different units. We can plot one unit along the y-axis, and use the
sec.axis argument of the scale_y_continuous
function to create an axis with the other set of units on the
opposing side of the plot:
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot()+
geom_jitter(aes(fill = trt), shape = 21, width = 0.2, size = 4, show.legend = FALSE)+
scale_y_continuous(name = "Yield (bu/ac)",
limits = c(70, 130),
breaks = c(seq(70,130,10)),
# Here we add a second axis
sec.axis = sec_axis(trans = ~ . *0.0672, # Transform yield from bu/ac to ton/ha
name = "Yield (ton/ha)", # New axis name
breaks = c(seq(4.5,9,0.5)))) # Axis breaks
Tip: You can use sec.axis = dup_axis()) to simply
duplicate the y-axis to the other side of the plot if you want the same
values on each side for easier reference.
Just as with axes defined by continuous variables, we can use scale components to modify axes with discrete variables. With either scale_x_discrete() or scale_y_discrete() we can rearrange and exclude levels/values of discrete variables.
We will use treatment(‘trt’) again as the x-axis to show how we can modify axes of discrete variables:
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot()+
geom_jitter(aes(fill = trt), shape = 21, width = 0.2, size = 4, show.legend = FALSE) +
# We use scale_x_discrete:
scale_x_discrete(
limits = c("B", "A", "D")) # Exclude trt "C"
## Warning: Removed 16 rows containing missing values (`stat_boxplot()`).
## Warning: Removed 16 rows containing missing values (`geom_point()`).
Note: R will give a warning message letting us know that there are
missing values. This is because using
limits = c("B", "A", "D")) transformed all “C” treatments
to NA’s.
We can also re-label the treatments along the x-axis using the
labels argument within scale_x_discrete:
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot()+
geom_jitter(aes(fill = trt), shape = 21, width = 0.2, size = 4, show.legend = FALSE)+
scale_x_discrete(
labels = c('B' = 'New Label', # Re-label trt "B" as "New Label 1"
'D' = 'New Label \n in two lines')) # Trt "D" becomes "New Label 2 in 2 lines"
Tip: You can use “” to break the label into multiple lines. Just insert the “” where you want the break.
Just as with color aesthetic elements, we can modify ‘color’, ‘fill’,
and ‘alpha’ to better represent our data. We can use scale elements in
ggplot2 to represent categorical and quantitative data with
a range of color and transparencies. Similarly to the color aesthetics
section, color and fill arguments are very
similar. To avoid redundancy, we will use fill only. Keep
in mind, however, that you can use color in the same
way.
Colors in R may be referenced or accessed in several different ways.
In base R, colors have names like those shown in the next image. You can
see a list of colors using colors(). These colors are also
defined by the RGB code, an additive combination of
Red, Green, and Blue.
Users can also reference these colors using their ‘hex’ code
(hexadecimal code), which is constructed of a 6 character combination of
letters A-F and numbers 0-9. For example, the same color can be
referenced with "yellow" and with "FFFF00".
Additionally, the hex code can be extended to an 8 character code, with
the last two characters representing color transparency. For example,
"FFFF0050" would be yellow at 50% transparency.
Some R colors, hex code formatting, and RGB scheme
Selecting colors for a plot may sound trivial, but it can influence how interpretable your figures are and is often challenging. There are many things to consider when designing or choosing a color scheme/pallet. We need to consider whether viewers will be able to distinguish between different hues, especially for colors commonly confused with color-blindness such as red and green. These considerations and others constitute good practices for data visualization. Although this is not a part of this course, we highly recommend that you look for more information on these topics to maximize the impact of your figures.
Fortunately, there are plenty of resources, including R packages,
from which you can choose color pallets. The most popular of these is
the ‘ColorBrewer’ project designed by Penn State professor Dr. Cynthia A. Brewer. This
project has both an interactive website
for choosing pallets and an R package rcolorbrewer.
The website has helpful suggestions for pallets of sequential,
diverging, and qualitative data, as well as pallet options for color
blindness and printing.
Colorbrewer2.org; 1 - number of classes; 2 - type of data; 3 - color pallet options; 4 - printing/fill options; 5 - details on colors used
The Color Brewer project was so popular that functions from the
package scale_color_brewer() and
scale_fill_brewer() became native (included) to
ggplot2. rcolorbrewer also contains some
pre-defined color pallets that can be viewed with
display.brewer.all().
This is an example of how a
ggplot/rcolorbrewer function and pre-defined
pallet can be applied to add a color ‘layer’ to our plot:
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot()+
geom_jitter(aes(fill = trt),
shape = 21, width = 0.2, size = 4)+
scale_fill_brewer(palette = "Paired") # R automatically identifies the number of 'classes' (treatments, in this case) and applies pallet colors accordingly
Other developers have created packages for color pallets, such as
package ggsci.
In ggsci, pallets were created using colors from a group of
scientific journals.
Here we provide an example of ggsci function
scale_fill_npg() to apply colors to our plot.
library(ggsci)
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot()+
geom_jitter(aes(fill = trt),
shape = 21, width = 0.2, size = 4)+
scale_fill_npg() # Just like rcolorbrewer, the number of classes are automatically determined and colors from the pallet are applied
We can use a gradient of color to represent quantitative values.
There are three major color scale functions:
scale_fill_gradient(), scale_fill_gradient2(),
and scale_fill_gradientn(). These functions can be used to
apply a different number of colors to quantitative data. Alpha values,
which determine the transparency of the color, can also be scaled for
quantitative variables with scale_alpha_*() functions.
The first fill function, scale_fill_gradient(), can be
used to create a gradient from two colors to represent low and high
values. Here, we create a plot where bluer points indicate low FDK
values and redder values high. Data points with NA values are indicated
in black.
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot()+
geom_jitter(aes(fill = fdk), # This tells R to color according to fdk variable
shape = 21, width = 0.2, size = 4)+
scale_fill_gradient(low = "blue",
high = "red",
na.value = "black")
The second function, scale_fill_gradient2() can be used
to create a gradient with three colors, representing low, intermediate,
and high values. Here, we add the use of the color yellow to represent
intermediate values:
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot()+
geom_jitter(aes(fill = fdk),
shape = 21, width = 0.2, size = 4)+
scale_fill_gradient2(low = "blue",
mid = "yellow",
high = "red",
midpoint = 21, # Define the middle point value. Default is 0
na.value = "black")
Note that ggplot will distribute the color symmetrically from the middle point. Depending on where you define the middle point, true red or blue values may not be reached. In this case, the distribution of values is represented well with 21 as the mid point.
Function scale_fill_gradientn() can be used to create a
gradient from any number (n) of colors. Here we apply a rainbow of
colors to represent FDK values.
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot()+
geom_jitter(aes(fill = fdk),
shape = 21, width = 0.2, size = 4)+
scale_fill_gradientn(colours = c("#FF0000", "#FFFF00", "#00FF00",
"#00FFFF", "#0000FF", "#FF00FF"))
Earlier, we used geom_jitter(aes(alpha = fdk)) to apply
alpha transparency values to a plot.
# Without specifying the scale, R will assume the default values
# therefore, here we would have the same results as if we have
# scale_alpha_continuous(range = c(0.1,1))
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot()+
geom_jitter(aes(alpha = fdk), # Setting alpha values
width = 0.2,
size = 4)
By default, the range of transparency values applied to the above
plot range from 0.1 to 1.0. With function
scale_alpha_continuous(), we can exert more control over
the range of transparency values applied to the plot. Here, we adjust
the range of color transparency with the range argument to
produce a plot with a more constrained set of shades.
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot()+
geom_jitter(aes(alpha = fdk),
width = 0.2, size = 4)+
scale_alpha_continuous(range = c(0.3, 0.6)) # Set transparency values at 0.3 (30%) to 0.6 (60%) of shade
Setting range values can improve the visibility of
certain data points. Smaller FDK values became more visible against the
light background of the plot.
Note that there are two observations in treatment A where the color transparency did not change. These two observations represent points without values (NAs) for variable FDK. There are a few some strategies to deal with this problem.
The first solution could be to remove those two observations with missing values. This would eliminate the two data points on the graph, but the box plots would also be affected.
A better option is to represent these NA observations with a
different color or shape. To achieve this, we can take advantage of the
layer design of ggplot2 and the na.value
argument of scale_alpha_continuous. We can use
na.value = 0 to set the transparency of any NA values to
0%, which is invisible. Then, we can use a second
geom_jitter() layer to add the problematic observations
back in, represented by a different shape with shape = 8.
By not including the alpha argument in the new
geom_jitter() layer, we make it so that points from this
new layer are not affected by na.value = 0.
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot()+
geom_jitter(aes(alpha = fdk),
width = 0.2, size = 4)+
geom_jitter(data = filter(data_demo, is.na(fdk)), # Add 2nd geom_jitter layer with only NA observations
aes(x = trt, y = yld),
shape = 8, # Define a different shape for the NA observations
width = 0.2, height = 0, size = 4)+
scale_alpha_continuous(range = c(0.4, 1),
na.value = 0) # Set transparency of observations where FDK is NA to 0. It will only be applied to geom layers where alpha is defined.
This is an example of how we can creatively use the layered design of
ggplot2 to create customized plots. This kind of technique
can be applied to other situations, such as when there is a need to
emphasize a particular observation in the data.
Tip: Notice that applying a color-related scale component to a plot
where no color, fill, or alpha
arguments are defined doesn’t work to apply any change in color or
transparency:
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot()+
geom_jitter(aes(size = fdk), # Instead of 'alpha', we used 'size'
width = 0.2, size = 4)+
geom_jitter(data = filter(data_demo, is.na(fdk)),
aes(x = trt, y = yld),
shape = 8,
width = 0.2, height = 0, size = 4)+
scale_alpha_continuous(range = c(0.4, 1),
na.value = 0) # There is no alpha feature for this scale to be applied to
Remember that we build up plots with these components. In
order to scale features like colors, axes, shapes, etc., those features
need to be present in another layer of the plot, such as those
constructed by functions ggplot() or
geom_*().
Scaling for size is performed similarly to alpha scaling. We can use
scale_size_continuous() to apply a range of sizes to data
according to their respective values. Arguments range and
trans allow users to set the range of sizes and reverse the
order in which they are applied.
Here we apply scale_size_continuous() and
range to assign size to observations based on the value of
variable FDK:
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot()+
geom_jitter(aes(size = fdk),
width = 0.2)+
scale_size_continuous(range = c(1, 6)) # These are actually the default values, but you can adjust them to better emphasize larger values
## Warning: Removed 2 rows containing missing values (`geom_point()`).
Note that ggplot2 provides us with a warning message
indicate that two values were removed. Color scaling functions like
scale_alpha_continuous() and
scale_fill_continuous() simply assigned the NA observations
a color or transparency not used in the range of the rest of the data.
In contrast, scale_size_continuous() removed these
values.
Imagine the difficulties associated with automatically assigning NA
observations a different point size: Assigning a size larger
than the 6 used for FDK values of 40 could obscure other data points,
and a smaller size would be difficult to distinguish from
points where FDK is 0, or it would be hardly visible. This is an example
showing the importance of understanding how different
ggplot2 functions will deal with NAs, and influences how we
can replace those observations in the plot.
Just like in the last example of
scale_alpha_continuous(), we will show how to use the
layered design of ggplot2, along with our understanding of
how NA values are handled by scale_size_continuous to
visualize the data effectively. For fun, we will also use the
trans argument to reverse the sizing - small values will be
represented by large points, and large values by small points.
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot() +
geom_jitter(aes(size = fdk),
width = 0.2) +
geom_jitter(data = filter(data_demo, is.na(fdk)), # Add 2nd geom layer with NA values
aes(x = trt, y = yld),
shape = 8,
width = 0.2, height = 0, size = 4) +
scale_size_continuous(range = c(0.25, 5), # Modify range to better visualize the distribution of FDK values
trans = "reverse") # Reverse the default order of sizing
## Warning: Removed 2 rows containing missing values (`geom_point()`).
We get the same warning message, but our NA values are still
represented in the second geom_jitter() layer.
Compared to some of the other scale components of
ggplot2, scaling the shape of data points is limited. It is
inappropriate to use shape_scale_manual() to represent
continuous values because, unlike shades of color or size, point shapes
are discrete units. However, we can still use
shape_scale_manual() to more specifically assign shapes to
represent categories of observations within our data.
With shape aesthetic components, we were able to tell
ggplot() to use shapes to indicate different groups of
data. We were also able to tell ggplot() to use a specific
symbol for all point data within the plot. Using
shape_scale_manual() allows us to assign specific
shapes to specific groups.
Now we assign shapes 8 (asterisk) and 15 (solid square) to the varieties variable to better illustrate which observations are genetically resistant (R) and which are genetically susceptible (S).
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot()+
geom_jitter(aes(shape = var), # We will use the factor, variety, to define the different shapes
width = 0.2, size = 4)+
scale_shape_manual(values = c(8, 15)) # Here we defined two different shapes depending on the variety
Just like with shape, scaling by line type doesn’t work for
continuous variables. However, we can use
scale_linetype_manual() to assign different line types to
different categorical variables.
We’ll construct a previous plot without
scale_linetype_manual() to compare against:
# We group the data by treatment and variety, and summarize the mean severity
data_line <- data_demo %>%
group_by(trt, var) %>%
summarise(sev = mean(sev))
## `summarise()` has grouped output by 'trt'. You can override using the `.groups`
## argument.
data_line
## # A tibble: 8 × 3
## # Groups: trt [4]
## trt var sev
## <chr> <chr> <dbl>
## 1 A R 1.61
## 2 A S 6.28
## 3 B R 1.7
## 4 B S 1.58
## 5 C R 0.475
## 6 C S 0.669
## 7 D R 0.431
## 8 D S 0.444
# Then we create a plot with different line types for treatments where var=R or var=S. Notice that we use the data structure 'data_line', which we created above.
ggplot(data_line) +
geom_line(aes(x = trt,
y = sev,
linetype = var, # Setting line type to distinguish between R and S varieties
group = var), # Telling R that it should group the data by variety
linewidth = 1) # Setting the thickness of the lines
We’ve successfully created a plot where varieties R and S are
represented by different line types. But if we want specific line types
applied to these varieties, we need to use
scale_linetype_manual() to assign the line types we’d like
to use. Remember that different types of lines in ggplot2
can be referenced by their names or by their numbers.
We create a plot identical to the one above except with
scale_linetype_manual() defining different line types to
use:
ggplot(data_line) +
geom_line(aes(x = trt,
y = sev,
linetype = var,
group = var),
linewidth = 1) +
scale_linetype_manual(values = c("twodash","dashed"))
So far, we’ve seen that scale_*() functions can be used
to more specifically control plot aesthetics. Now we can move into more
advanced functions that will allow us to create more complex
figures.
facet_*() functions allow us to arrange multiple plots
in a side-by-side or grid-like manner, based on a specified variable. In
our previous examples, we used shape to differentiate some of our
variables, like ‘variety’ and ‘treatment’. Now, we will use
facet_*() functions to visualize our data with more
variables represented.
Here, we create a faceted plot with facet_wrap() to
represent yield by treatment type and variety:
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot()+
geom_jitter(aes(fill = trt), shape = 21, width = 0.2, size = 4)+
scale_fill_npg() +
# Here we tell R to facet the plot by variety:
facet_wrap(~var)
Our original plot is now split in two. One side shows yield values of
the resistant variety (R) by treatment type, while the other shows the
same information for the susceptible variety (S). Notice that the y-axis
is the same between these two plots. Using facet_wrap()
resulted in two standardized plots that can be directly compared.
We can see that yield values are distinct between R and S varieties,
resulting in more constrained boxes in the geom_boxplot()
layer. This results in some values being defined as outliers. These
outliers are represented by both a colored point and a black dot in the
geom_jitter() and geom_boxplot() layers,
respectively. Obviously, we don’t want some treatments represented twice
in our plots, but we also want to show the correct variation within the
box plots.
To do this, we use the outlier.alpha argument within
geom_boxplot() to set the transparency of the black dots to
0, which is invisible. This way, the outliers will be expressed within
the geom_jitter() layer. Their values will be included when
calculating box plots, but won’t be visibly represented.
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot(outlier.alpha = 0) + # Make outliers completely transparent
geom_jitter(aes(fill = trt), shape = 21, width = 0.2, size = 4)+
scale_fill_npg() +
facet_wrap(~var)
You can see that the black dots representing outliers in the box plot layer are gone. The outlier data points are no longer represented twice within the plots.
If we wanted to use different y-axis values for each of these two
plots, we can use the scales argument within
facet_wrap(). We can also use the labeller
argument to modify axis labels and the ‘strip’ text (the text above each
plot).
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot(outlier.alpha = 0) +
geom_jitter(aes(fill = trt), shape = 21, width = 0.2, size = 4)+
scale_fill_npg() +
facet_wrap(~var,
scales = "free_y", # Y-axis scales are assigned independently between facets
labeller = labeller(
var = c("R" = "Resistant",
"S" = "Susceptible"))) # Replace 'R' with 'Resistant' and 'S' with 'Susceptible'
The default value for scales is ‘fixed’, which makes
standardized plots. Other options for the scales argument
include ‘free’ (for both axes) and ‘free_x’ (for x-axis).
We can also use the ncol and nrow arguments
within facet_wrap() to create stacked plots, rather than
side-by-side plots:
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot(outlier.alpha = 0) +
geom_jitter(aes(fill = trt), shape = 21, width = 0.2, size = 4)+
scale_fill_npg() +
facet_wrap(~var, labeller = labeller(var = c("R" = "Resistant", "S" = "Susceptible")),
ncol = 1, nrow = 2) # Arrange plots in 1 column and 2 rows
You can create a grid of plots with one variable represented
horizontally and the other vertically with facet_grid(). We
can still represent variety (resistant vs. susceptible) with columns of
plots while also representing disease incidence levels in rows.
Currently, disease incidence level (‘inc’) is a continuous variable,
which would be hard to represent within rows. However, we can transform
it into a categorical variable by splitting the values into ‘low
incidence’ (inc < 10) and ‘high incidence’ (inc > 10). We do this
with mutate() an if_else() function
statement.
data_grid <- data_demo %>%
mutate(inc_cat = if_else(inc <=10, "Low Incidence", "High Incidence")) # If the statement 'inc <= 10' is TRUE, use mutate() to transform new variable 'inc_cat' into 'Low Incidence'. If the statement is FALSE, transform the value to 'High Incidence'
data_grid
## # A tibble: 64 × 10
## plot trt var blk sev inc yld don fdk inc_cat
## <dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 107 A R 1 1.95 35 86.6 0.11 4 High Incidence
## 2 109 A R 1 1.2 20 80.3 0.15 4 High Incidence
## 3 204 A R 2 0.9 20 81.1 0 42 High Incidence
## 4 214 A R 2 4.05 50 85.1 0.07 25 High Incidence
## 5 305 A R 3 1.1 15 84.8 0 29 High Incidence
## 6 316 A R 3 1.8 10 93.7 0 24 Low Incidence
## 7 406 A R 4 1.4 25 84.5 0.06 37 High Incidence
## 8 412 A R 4 0.45 10 84.8 0.07 42 Low Incidence
## 9 108 A S 1 13.3 55 92.3 0.42 NA High Incidence
## 10 110 A S 1 12.3 50 103. 0.26 NA High Incidence
## # ℹ 54 more rows
# New variable 'inc_cat' is categorical
ggplot(data = data_grid, aes(x = trt, y = yld)) +
geom_boxplot(outlier.alpha = 0)+
geom_jitter(aes(fill = trt), shape = 21, width = 0.2, size = 4)+
scale_fill_npg() +
facet_grid(inc_cat~var, # Create grid with variables 'inc_cat' and 'var' with 'inc_cat' as the rows and 'var' as the columns (row variable ~ column variable)
labeller = labeller(var = c("R" = "Resistant", "S" = "Susceptible")))
We can also use facet_grid() in a similar way to
facet_wrap() to arrange plot facets by a single
variable.
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot(outlier.alpha = 0)+
geom_jitter(aes(fill = trt), shape = 21, width = 0.2, size = 4)+
scale_fill_npg() +
facet_grid(.~var) # Use 'var' as column in the grid. The '.' is used as a place-holder for the row variable.
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot(outlier.alpha = 0) +
geom_jitter(aes(fill = trt), shape = 21, width = 0.2, size = 4)+
scale_fill_npg() +
facet_grid(var~.) # Use 'var' as a row.
With this approach, we can achieve the same results as when we used
the ncol and nrow arguments with
`facet_wrap().
We’ve already seen some ways to modify the labels of our plots.
However, using the labs() function to add a label component
to our plot offers more control and customization.
We can use labs() to add elements like titles,
subtitles, captions, etc., as well as to add more descriptive axis
labels.
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot(outlier.alpha = 0)+
geom_jitter(aes(fill = trt), shape = 21, width = 0.2, size = 4)+
scale_fill_npg() +
labs(
title = "Here is a title",
subtitle = "Here is the plot subtitle",
caption = "This is figure caption",
tag = "This is a figure reference label",
y = "y-axis title",
x = "x-axis title",
fill = "fill title"
)
We can further customize these elements with theme components.
Theme components can be used to modify non-data components of the
plot, such as title location, background color, the use of grid lines,
and more. theme_*() functions are used to add these
component.
ggplot2 contains 10 pre-set themes and has options to
add further customization. Other packages like ggthemes()
offer more theme pre-sets. Themes can be useful when showing plots in
different settings, such as with projectors, on colored paper, or in
journals with figure specifications.
ggplot2 has 10 built-in themes that can be quickly
applied to plots with individual functions. For example, theme ‘bw’ is
applied with theme_bw().
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot(outlier.alpha = 0) +
geom_jitter(aes(fill = trt), shape = 21, width = 0.2, size = 4) +
scale_fill_npg() +
labs(y = "Yield (bu/ac)", x = "Treatments", fill = "Treatments") +
theme_bw() # Use theme 'bw'
Tip: The ‘bw’ in theme ‘bw’ stands for black on white. This high-contrast theme is useful when projecting images.
The same plot can be constructed with another pre-set theme, such as theme ‘minimal’:
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot(outlier.alpha = 0) +
geom_jitter(aes(fill = trt), shape = 21, width = 0.2, size = 4) +
scale_fill_npg() +
labs(y = "Yield (bu/ac)", x = "Treatments", fill = "Treatments") +
theme_minimal() # Use theme 'minimal'
Examples of all 10 of the pre-set themes of ggplot2 can
be viewed here.
# Load in ggthemes
library("ggthemes")
# Use ggthemes
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot(outlier.alpha = 0) +
geom_jitter(aes(fill = trt), shape = 21, width = 0.2, size = 4) +
scale_fill_npg() +
labs(y = "Yield (bu/ac)", x = "Treatments", fill = "Treatments") +
theme_economist() # Use theme 'economist' from ggthemes
We can create and apply custom themes as well. ggplot2
offers many different options, most of which we won’t be able to cover.
Documentation for all the options in theme() can be found
here.
We’ll construct a plain plot (using the default theme) for comparison:
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot(outlier.alpha = 0)+
geom_jitter(aes(fill = trt), shape = 21, width = 0.2, size = 4)+
scale_fill_npg() +
facet_grid(~var, labeller = labeller(var = c("R" = "Resistant", "S" = "Susceptible")))+
labs(y = "Yield (bu/ac)", x = "Treatments", fill = "Treatments") +
theme() # Leaving theme() empty results in the default theme
We can set colors for different areas (‘panels’) with a variety of
panel.* arguments. Axes and text objects can be similarly
modified with axis.text.* and title arguments.
We can modify the legend with the predictable legend.*
group of arguments. A variety of element_*() functions can
be used within these arguments to define colors, sizes, and more.
Here, we set colors for each panel of the plot, the y-axis, the title, and the legend. We also hide the x-axis, set the size for the title, and move the legend to a new position. We also reformat the strip texts (where ‘Resistant’ and ‘Susceptible’ are labelled).
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot(outlier.alpha = 0) +
geom_jitter(aes(fill = trt), shape = 21, width = 0.2, size = 4) +
scale_fill_npg() +
facet_grid(~var, labeller = labeller(var = c("R" = "Resistant", "S" = "Susceptible"))) + # This is our strip text
labs(y = "Yield (bu/ac)", fill = "Treatments",
x = NULL) +
# Here is where enact our modifications:
theme(
panel.background = element_blank(), # Set background to white
panel.grid.major = element_line(colour = "grey88"), # Change the major grid lines to a shade of grey
panel.border = element_rect(colour = "grey80", fill = NA), # Add a border in a shade of grey. Don't fill the space with color
axis.text.y = element_text(colour = "black"), # Print the y-axis label in black
axis.text.x = element_blank(), # Make the x-axis label a blank space
title = element_text(colour = "black", size = 12, face = "bold"), # Set the title to black, size 12, and in bold font
legend.position = c(0.90, .20), # Move the legend to these coordinates relative to the image
legend.key = element_rect(fill = "white"), # Fill the legend background with white
legend.text = element_text(colour = "black", size = 12), # Make legend text black and size 12
# Strip text justified almost completely to the left - (hjust = 0.01)
# hjust = 0.5 is center justification, and hjust = 1 is right justification
strip.text.x = element_text(hjust =0.01, face = "bold", size = 14),
# Move strip text to the left border (hjust = 0.01) and make the text bold and size 14.
# Tip: 'hjust = 1' moves the object to the right border
strip.background = element_blank()) # Change the background behind strip text from grey to blank (white)
Frequently, we need to combine multiple plots into a single one. While many people copy and paste their plots in a MS Power Point (or similar software) and save as new plot, this approach is very inefficient, compromises quality, and may cause unintentional errors.
A better approach is combine these plots in single one and save.
There are a few good packages that can help with this task, including,
gridExtra,
cowplot,
and ggpubr.
We will also explore a new package, and arguably an easier one to use,
patchwork.
First, let’s build some plots to be combined using
patchwork. We will build four plots, considering the effect
of treatment on the following variables: yield, severity, incidence, and
FDK.
Now we will create 4 individual plots using custom formatting. We won’t be visualizing them to save space on the page.
# Create (but not print) yield data plot:
plot_yield <-
ggplot(data = data_demo, aes(x = trt, y = yld)) +
geom_boxplot(outlier.alpha = 0)+
geom_jitter(aes(fill = trt), shape = 21, width = 0.2, size = 4)+
scale_fill_npg() +
labs(title = "Yield data", y = "Yield (bu/ac)", x = NULL)+
theme(
axis.text = element_text(colour = "black"),
title = element_text(colour = "black", size = 12, face = "bold"),
legend.position = "none")
# Severity plot:
plot_sev <-
ggplot(data = data_demo, aes(x = trt, y = sev)) +
geom_boxplot(outlier.alpha = 0)+
geom_jitter(aes(fill = trt), shape = 21, width = 0.2, size = 4)+
scale_fill_npg() +
labs(title = "Severity data", y = "Severity (%)", x = NULL)+
theme(
axis.text = element_text(colour = "black"),
title = element_text(colour = "black", size = 12, face = "bold"),
legend.position = "none")
# Incidence plot:
plot_inc <-
ggplot(data = data_demo, aes(x = trt, y = inc)) +
geom_boxplot(outlier.alpha = 0)+
geom_jitter(aes(fill = trt), shape = 21, width = 0.2, size = 4)+
scale_fill_npg() +
labs(title = "Incidence data", y = "Incidence (%)", x = NULL)+
theme(
axis.text = element_text(colour = "black"),
title = element_text(colour = "black", size = 12, face = "bold"),
legend.position = "none")
# FDK plot:
plot_fdk <-
ggplot(data = data_demo, aes(x = trt, y = fdk)) +
geom_boxplot(outlier.alpha = 0)+
geom_jitter(aes(fill = trt), shape = 21, width = 0.2, size = 4)+
scale_fill_npg() +
labs(title = "FDK data", y = "FDK (%)", x = NULL)+
theme(
axis.text = element_text(colour = "black"),
title = element_text(colour = "black", size = 12, face = "bold"),
legend.position = "none")
Now we’ll load in the patchwork package and use the
+ symbol to combine the yield and severity plots we just
created:
library(patchwork)
plot_yield+plot_sev
We can also use the / symbol to place one plot above the
other:
plot_yield/plot_sev
Using () allows us to create a hierarchical plotting
design. First, yield and severity plots are placed side-by-side, along
with incidence and FDK plots. Then, the first group (yield and severity)
are placed above the second group, creating a 2x2 grid.
# Perform operations within parentheses/brackets, then operations outside:
(plot_yield+plot_sev) /
(plot_inc + plot_fdk)
## Warning: Removed 2 rows containing non-finite values (`stat_boxplot()`).
## Warning: Removed 2 rows containing missing values (`geom_point()`).
We can further modify this resulting figure with tags and by altering
theme settings. To create tags, we use patchwork function
plot_annotation.
# Recreate the previous plot:
(plot_yield+plot_sev) /
(plot_inc + plot_fdk) +
# Tag plots with 'A', 'B', 'C', 'D'
plot_annotation(tag_levels = "A",
title = "Demo data") &
theme(
axis.text.x = element_text(colour = "red")) # Note that '&' is needed to add theme elements here. This is because we are adding a theme layer on top of a 'patchwork' package layer instead of another 'ggplot2' layer.
## Warning: Removed 2 rows containing non-finite values (`stat_boxplot()`).
## Warning: Removed 2 rows containing missing values (`geom_point()`).
Finally, we can use the ggplot2 function
ggsave to save our plots. We can use ggsave()
save plots individually, or in patchwork combinations like
we did in the ‘Combining Plots’ section.
# Recreate the previous plot as 'plot_comb' object:
plot_comb <- (plot_yield+plot_sev) /
(plot_inc + plot_fdk) +
# Tag plots with 'A', 'B', 'C', 'D'
plot_annotation(tag_levels = "A",
title = "Demo data") &
theme(
axis.text.x = element_text(colour = "red"))
# Save the plot:
ggsave(plot = plot_comb, filename = "figures/intro_r/plot_comb.png")
## Saving 8 x 8 in image
## Warning: Removed 2 rows containing non-finite values (`stat_boxplot()`).
## Warning: Removed 2 rows containing missing values (`geom_point()`).
# We can add arguments 'width', 'height', 'units', and more to 'ggsave()' to manage our image quality:
ggsave(plot = plot_comb, filename = "figures/intro_r/plot_comb_with_args.png",
width = 300,
height = 280,
units = "mm",
dpi = 450)
## Warning: Removed 2 rows containing non-finite values (`stat_boxplot()`).
## Removed 2 rows containing missing values (`geom_point()`).
Below is how our two saved plots look:
Our two resulting plots with differing ggsave() arguments
Our two resulting plots with differing ggsave() arguments
Notice how different these plots look. This is because the image
saved with ggsave() arguments specified has different
dimensions than the image saved with default ggsave()
values. We can alter these arguments to change the resolution of these
images (for example, increasing resolution for images that will be
printed on large posters, or reducing the size of images that will be
uploaded/downloaded often). The use of these arguments can influence the
size of the text, individual data points, etc., allowing for further
control of the image.
Now that we know the basics for building and saving plots with
ggplot and its extension packages, we can focus on other
topics in plant epidemiology. Please see the navigation menu at the top
of the page to learn more about experimental design, sampling methods,
and different analyses you can do in R.